Reference points for comparisons of two-dimensional 
maps of proteins from different human cell types 
defined in a pH scale where isoelectric points correlate 
with polypeptide compositions 

A highly reproducible, commercial and nonlinear, wide-range immobilized pH
gradient (IPG) was used to generate two-dimensional (2-D) gel maps of
[35S]methionine-labeled proteins from noncultured, unfractionated normal
human epidermal keratinocytes. Forty-one proteins, common to most human
cell types and recorded in the human keratinocyte 2-D gel protein database,
were identified in the 2-D gel maps and their isoelectric points (pI) were
determined using narrow-range IPGs. The latter established a pH scale that
allowed comparisons between 2-D gel maps generated either with other IPGs
in the first dimension or with different human protein samples. Of the 41
proteins identified, a subset of 18 was defined as suitable to evaluate the
correlation between calculated and experimental pI values for polypeptides
with known composition. The variance calculated for the discrepancies
between calculated and experimental pI values for these proteins was 0.001
pH units. Comparison of the values by the t-test for dependent samples
(paired test) gave a p-level of 0.49, indicating that there is no
significant difference between the calculated and experimental pI values.
The precision of the calculated values depended on the buffer capacity of
the proteins, and on average, it improved with increased buffer capacity.
As shown here, the widely available information on protein sequences
cannot, a priori, be assumed to be sufficient for calculating pI values
because post-translational modifications, in particular N-terminal
blockage, pose a major problem. Of the 36 proteins analyzed in this study,
18-20 were found to be N-terminally blocked and of these only 6 were
indicated as such in databases. The probability of N-terminal blockage
depended on the nature of the N-terminal group. Twenty-six of the proteins
had either M, S or A as N-terminal amino acids and of these 17-19 were
blocked. Only 1 in 10 proteins containing other N-terminal groups was
blocked.



1 Introduction 

As compared with carrier ampholyte isoelectric focusing (CA-IEF), the
application of immobilized pH gradients (IPGs) in the first dimension in
2-D gel electrophoresis offers improved reproducibility [1] because the
nature of the pH gradient makes the resulting focusing positions
insensitive to the focusing time [2] and to the type of sample applied [3].
The recently introduced ready-made IPG strips [4] seem to be an ideal
substitute for the carrier ampholyte gradients, which until now have been
the most commonly used first dimensions in 2-D gel electrophoresis. The
availability of standardized first dimensions opens the possibility of
comparing 2-D gel maps of various cell types generated in different
laboratories, provided that the focusing positions of a number of easily
recognizable polypeptide spots common to the cell types in question are
known. Even though this approach is limited to experiments performed with
the same standardized IPG, the flexibility provided by IPGs allows the pH
gradient to be adjusted to the requirements of a particular experiment.

Exchange and communication of 2-D gel protein data requires a pH scale
that is independent of the particular IPG used and by which the results can
be described. The introduction of carbamylation trains and the relation of
focusing positions to the spots in these trains represented a step forward
towards solving the reproducibility problem experienced with carrier
ampholyte focusing [5]. Problems associated with the use of carbamylation
trains were mainly due to lack of temperature control and to the use of
nonequilibrium focusing conditions. Accordingly, the pattern variation
involved not only the resulting pH gradients, but also the relative spot
positions as related to each other and to spots in the carbamylation
trains. Even though the question of reproducibility has, to a large extent,
been solved, the carbamylation trains are still not ideal as markers
because the spots in the trains do not represent defined entities but
rather a large number of differently carbamylated peptides having close pI
values. As a result, the spots are large and poorly defined as compared to
the ordinary polypeptide spots in 2-D gel maps.

Neidhardt et al. [6] defined the pH gradient in 2-D gel experiments by pI
markers whose pI values were calculated from the amino acid composition.
Focusing positions of other polypeptides could be predicted from their
composition but the pK values needed for the pI calculations were unknown.
Various groups employing this approach do not use the same pK values [6, 7]
and therefore, the pI values derived in this way cannot be expected to
describe the variation of the hydrogen ion activity. In spite of this fact,
it is still possible to make approximate predictions of focusing positions
because the pK values used to define the pH gradient are also used to
calculate pI values and to predict the focusing positions. Errors in pK
assignments are therefore compensated. A pH scale which correctly reflects
the variation in hydrogen ion activity during focusing should improve the
precision of the predictions, but this has never been implemented with
CA-IEF focusing as a first dimension in 2-D gel electrophoresis. The main
reasons for this are the problems associated with pH measurements in
focused gels containing high concentrations of urea.

IPGs can be described from the concentration variation of the immobilized
groups, provided that the pK values of these groups are known for the
conditions prevailing during focusing. To avoid measurements on gels,
Gianazza et al. [8] suggested the use of pK values derived by addition of
determined pK shifts. Recently, direct determinations of pK differences
between immobilized groups in IPGs were made by determining pI-pK values in
overlapping narrow-range IPGs [9, 10] and the results verified the
applicability of the Gianazza approach. A description of the focusing
results in a pH scale, which correctly describes the variation of the
hydrogen ion activity for the focusing conditions used, not only allows the
comparison of 2-D gel maps generated with different IPGs, but also opens
the possibility for correlating the focusing position of a polypeptide with
its composition [9]. Experiments by Bjellqvist et al. [9, 10] have implied
that pH scales showing good correlation between calculated and experimental
pI values can be derived for any of the conditions commonly used for
focusing in connection with 2-D gel electrophoresis. These pH scales are
then defined through the pK values of the immobilized groups in the
IPG-containing gel. To be useful for interlaboratory comparisons, however,
the pH scale has to be defined through pI values of easily recognizable
spots present in the 2-D gel map. So far, pI determinations in a useful pH
scale, combined with determinations of pK values needed for pI
calculations, have only been made for the pH range 4.5-6.5 at 10°C [9].
CA-IEF focusing as described by O'Farrell [11] does not control the
temperature of the first dimension, which can be expected to be slightly
above room temperature. With IPGs, the temperature commonly used is about
20°C [4, 12] or 25°C [13] and this is a critical parameter that needs to be
controlled [14].

The present work was designed to compare 2-D gel maps of different cell
types in a laboratory applying both CA-IEF and IPG focusing at a common
temperature. To this end we have generated 2-D gel maps of proteins from
noncultured, unfractionated normal human epidermal keratinocytes with IPG
in the first dimension and a focusing temperature of 25°C. We have used
commercial nonlinear, wide-range IPG strips which give 2-D gel maps that
are closely similar to the ones resulting with the CA-IEF technique used to
establish the human keratinocyte database [15]. As an initial step towards
interlaboratory comparisons of results obtained with the nonlinear gradient
as a first dimension we report here on the focusing positions of 41 known
proteins that are common to most human cell types. The pH range covered
corresponds to the range in classical CA-IEF 2-D gel electrophoresis and in
order to use these proteins as internal standards for comparing 2-D gel
maps generated with other IPGs we determined their pI values with
narrow-range IPGs in the first dimension. We have compared the calculated
versus experimental pI values and show that it is necessary to have further
information (absence or presence and nature of posttranslational
modifications), in addition to amino acid composition, to be able to
calculate pI values that correspond to the actual experimental values. The
pK values used for the calculations are provided and the usefulness of pI
prediction in relation to database information is discussed. Furthermore,
we comment on the possibility of using experimentally determined pI values
to verify the available database information on polypeptide composition.

2 Materials and methods 

2.1 Apparatus and chemicals 

Equipment for isoelectric focusing and horizontal SDS electrophoresis
(Multiphor II electrophoresis chamber, Immobiline strip tray, Multidrive XL
programmable power supply, Macrodrive power supply and Multitemp II) was
from Pharmacia LKB Biotechnology AB (Uppsala, Sweden). Vertical
second-dimensional gels were run in the home-made equipment described in
[15]. The IPG strips with the wide-range nonlinear pH gradient were either
Immobiline DryStrip pH 3-10 NL 180 mm or alternatively 160 mm long IPG
strips with a corresponding pH gradient. In both cases the IPG strips were
delivered by Pharmacia LKB. Immobiline, Pharmalyte, Ampholine, GelBond as
well as PAG film and the ready-made horizontal SDS gels (ExcelGel XL SDS
12-14) were also from Pharmacia LKB. Purified proteins and peptides were
from Sigma (St. Louis, MO).

2.2 Sample preparation 

Preparation and labeling of unfractionated keratinocytes as well as
fibroblasts have been described in [16]. Cells were lysed in a solution
containing 9.8 M urea, 2% w/v NP-40, 100 mM DTT and 2% v/v Ampholine pH
7-9.

2.3 2-D gel electrophoresis 

First-dimensional focusing was performed according to Görg et al. [2] with
some minor modifications, as described in [9]. Rehydration of the IPG
strips was made in a solution containing 9.8 M urea, 2% w/v CHAPS, 10 mM
DTT and 2% v/v carrier ampholyte mixture. The carrier ampholyte mixture
consisted of 2 parts Pharmalyte 4-6.5, 1 part Ampholine pH 6-8 and 1 part
Pharmalyte pH 8-10.5. Usually, cathodic sample application was used and the
samples were diluted 2-20 times in a solution containing 9.8 M urea, 4% w/v
CHAPS, 1% w/v DTT and 35 mM Tris base. For acidic application, the Tris
base was substituted with 100 mM acetic acid. The degree of dilution and
sample volume (20-100 µL) depended on the particular sample and the IPG,
and on whether visualization of the proteins was to be done by Coomassie
Brilliant Blue or silver staining. With the wide-range nonlinear IPG, 10-30
µg of total protein was loaded for silver staining and 100-200 µg for
Coomassie staining. Focusing was done overnight with Vh products in the
range of 45-60 kVh with 160 mm long strips and 50-70 kVh with 180 mm long
strips. Solubilization of polypeptides and blocking of -SH groups prior to
the second-dimensional run, as well as loading on the second-dimensional
gel, were done as described in [9]. The stacking gel was omitted and 5-10
mm were left at the top of the second-dimensional gel for applying the IPG
strip. The space was filled with electrode buffer containing 0.5% w/v
agarose. Casting, running, staining and autoradiography were carried out as
described in [15].

2.4 Experimental determination of pI values

The determination of the pK differences between Immobilines pK 4.6, pK 6.2
and pK 7.0 necessary for the calibration of the pH scale at 25°C in 9.8 M
urea was done as described in [9] with the same narrow-range IPGs. The pH
scale was defined by setting the pK value of Immobiline pK 4.6 equal to
4.61 [9] and the determined pK differences gave the pK values of
Immobilines pK 6.2 and pK 7.0, equal to 5.73 and 6.54, respectively. The pK
differences found are in good agreement with values derived from [17] and
[8] by extrapolation to 9.8 M urea concentration. As in [9], additional
narrow-range recipes have been used for determining pI values. With
narrow-range IPGs extending to pH values higher than the pK value of
Immobiline pK 7.0, anodic sample application was used with acetic acid
added to the sample solution. Otherwise, cathodic sample application was
used with the same sample buffer as for wide-range IPGs.

2.5 Protein compositions used for pI calculations

With the exception of vimentin, protein compositions are from the
Swiss-Prot database [18]. For vimentin, we used the data from [19], where
the amino acid at position 41 is a D instead of a S. Information in the
Swiss-Prot database on phosphorylation has been disregarded because it was
known from earlier studies (J. E. Celis, unpublished results) that the
spots in question corresponded to the unphosphorylated forms of the
peptides.

2.6 Calculation of pI values

For the pI calculations it was assumed that the same pK value could be used
for an amino acid residue in all polypeptides and in all positions in the
peptide except for N- or C-terminally placed amino acids. For the pK values
of the N-terminal amino groups the effect of the different substituents on
the α-carbon was taken into account. The calculations of pI values were
made with the aid of the IPG-maker program [20].

2.7 pK values used for pI calculations

For the carboxyl terminal group and internal glutamyl and aspartyl residues
the same pK values were used as in [9]. For C-terminal glutamyl and
aspartyl residues, separate pK values were derived with the aid of the Tan
equations [9, 21]. The pK values of histidyl groups were calculated from
the pI values of human carbonic anhydrase I as in [9]. For N-terminal
glycine a pK value of 7.50 was used. The pK shift caused by a substituent
on the α-carbon was assumed to be identical with the pK shift the
substituent caused for the amino group in the amino acid, i.e., 2.28 pH
units were subtracted from the pK values for the amino groups in the amino
acids given in [22, 23]. The approximate pK value of 9 for the cysteinyl
group was taken from [24]. For tyrosyl and arginyl groups we used the pK
values for the amino acids [22, 23]. For lysyl groups the effect of high
urea concentration on amino groups was taken into account and 0.5 pH units
were subtracted from the amino acid pK value. These last three pK values
are far from the pH range under study and the results found would have been
the same if lysyl and arginyl groups were assumed to be fully ionized while
the ionization of tyrosyl groups was neglected. A complete list of the pK
values used is given in Table 1.

Table 1. pK values used for the ionizable groups in peptides
(9.8 M urea, 25°C)

Ionizable group                   pK
C-terminal                        3.55
N-terminal
  Ala                             -
  Met                             7.00
  Ser                             -
  Pro                             -
  Thr                             6.82
  Val                             7.44
  Glu                             7.70
Internal
  Asp                             4.05
  Glu                             4.45
  His                             5.98
  Cys                             9
  Tyr                             10
  Lys                             10
  Arg                             12
C-terminal side chain groups
  Asp                             4.55
  Glu                             4.75
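The calculation outlined in Sections 2.6 and 2.7 can be illustrated with a minimal sketch. It is not the IPG-maker program [20]; it simply sums Henderson-Hasselbalch charges for the ionizable groups and finds the pH of zero net charge by bisection. The function names are hypothetical, a single N-terminal pK of 7.50 (the value quoted for N-terminal glycine) is assumed for every N-terminal residue, and the separate pK values for C-terminal Asp and Glu side chains are ignored for brevity.

```python
# Illustrative sketch only; names and simplifications are assumptions.
# pK values taken from the text and Table 1 (9.8 M urea, 25 degrees C).
PK_C_TERMINAL = 3.55
PK_N_TERMINAL = 7.50  # assumption: the N-terminal Gly value is used for all residues
PK_ACIDIC = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
PK_BASIC = {"H": 5.98, "K": 10.0, "R": 12.0}


def net_charge(sequence: str, ph: float, n_terminal_blocked: bool = False) -> float:
    """Net charge of an unfolded polypeptide at a given pH."""
    # The C-terminal carboxyl group contributes a negative charge.
    charge = -1.0 / (1.0 + 10 ** (PK_C_TERMINAL - ph))
    # A blocked (e.g. acetylated) N-terminal carries no positive charge.
    if not n_terminal_blocked:
        charge += 1.0 / (1.0 + 10 ** (ph - PK_N_TERMINAL))
    for residue in sequence:
        if residue in PK_ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (PK_ACIDIC[residue] - ph))
        elif residue in PK_BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PK_BASIC[residue]))
    return charge


def isoelectric_point(sequence: str, n_terminal_blocked: bool = False,
                      tolerance: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically with pH.
        if net_charge(sequence, mid, n_terminal_blocked) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

Calling `isoelectric_point(seq)` and `isoelectric_point(seq, n_terminal_blocked=True)` for the same one-letter sequence shows how much a blocked N-terminal lowers the calculated pI, which is the effect examined in the Results.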



2.8 Statistical analysis 

Statistical comparisons of the experimental and calculated pI values were
done on an Apple Macintosh IIsi using the statistical package
Statistica/Mac, release 3.0b (from StatSoft Inc., Tulsa, Oklahoma).
Calculated and experimental pI values were compared by the t-test for
correlated samples (paired t-test). The normality of pI differences was
estimated graphically by probability plots. The variances of the data
presented here and the similar data on plasma and liver proteins in [9]
were compared by the F-test.
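As a rough illustration of the two tests named above, the following sketch computes the paired t statistic and the variance of the discrepancies using only the Python standard library. It is not the Statistica/Mac procedure used in this work; the function names are hypothetical, and the p-level still has to be read from a t (or F) distribution with the returned degrees of freedom.

```python
import math
import statistics


def paired_t(calculated, experimental):
    """Paired t statistic for calculated vs. experimental values.

    Returns (t, degrees_of_freedom, variance_of_differences).
    """
    if len(calculated) != len(experimental):
        raise ValueError("paired samples must have equal length")
    differences = [c - e for c, e in zip(calculated, experimental)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    variance = statistics.variance(differences)  # sample variance (n - 1)
    t = mean_difference / math.sqrt(variance / n)
    return t, n - 1, variance


def variance_ratio(variance_a, variance_b):
    """F statistic for comparing two variances (larger over smaller)."""
    return max(variance_a, variance_b) / min(variance_a, variance_b)
```

A t statistic close to zero corresponds to a high p-level, i.e., no systematic offset between calculated and experimental values.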

3 Results and discussion 

3.1 Identification of polypeptides and pI determinations

The 2-D gel maps of [35S]methionine-labeled proteins from noncultured,
unfractionated normal human keratinocytes, focused with the nonlinear,
wide-range IPG and CA-IEF pH gradients in the first dimension, are shown in
Figs. 1 and 2, respectively. The IPG extends to higher pH values but
otherwise the two patterns are very similar and most of the spots in the
IPG pattern can be directly related to the corresponding spots in the
CA-IEF gel. To obtain comparable patterns it was important to keep the
focusing temperature as similar as possible. Compared to other studies
[1-4, 9, 10, 12-14], we increased the urea concentration in the focusing
gel to 9.8 M because keratins streaked badly in the focusing dimension when
8 M urea was used, presumably due to aggregates of acidic and basic
keratins. An increase in urea concentration to 9 M or more eliminated these
streaks; apart from this effect, no other major changes in the focusing
positions were observed. In Fig. 1 we have indicated the positions of 41
known proteins from the human keratinocyte 2-D gel database that are most
likely common to most human cell types. The choice was made because these
proteins are easy to identify with certainty. With the exception of
stratifin (spot 2), involucrin (spot 4) and keratin 14 (spot 15), which are
all epithelial markers, these proteins are also present in human
fibroblasts (Fig. 3) and lymphocytes (results not shown), and therefore can
be used as landmarks for comparing 2-D gel maps derived from different cell
types. In Table 2 the 41 proteins are listed together with their sample
spot numbers (SSP) in the human keratinocyte protein database and pI values
determined in 2-D gel maps generated with narrow-range IPGs in the first
dimension.

Figure 1. 2-D gel protein map of [35S]methionine-labeled proteins from
noncultured, unfractionated normal human keratinocytes focused with the
nonlinear wide-range IPG in the first dimension. The position of the 41
proteins analyzed in this study is indicated.

Figure 2. 2-D gel protein map of [35S]methionine-labeled proteins from
noncultured, unfractionated normal human keratinocytes focused with CA-IEF
in the first dimension. The position of the 41 proteins analyzed in this
study is indicated.

3.2 Comparison between the determined and calculated pI values for human
keratinocyte proteins

Thirty-six of the 41 proteins listed in Table 2 are found in the Swiss-Prot
database. Contrary to the plasma and liver proteins used in [9], the pI
calculations on the proteins used in this study posed some problems that
reflected the way in which they were characterized. The proteins used by
Bjellqvist et al. [9] were either abundant and well-characterized plasma
proteins or were identified by N-terminal sequencing and, thereby, the
nature of the N-terminals (acetylated or non-acetylated) was in both cases
known. The proteins used in this study have all been characterized by
internal sequencing [7] and it is known that N-terminal acetylation occurs
with high frequency in eukaryotes. According to Brown and Robert [25],
proteins with acetylated N-terminals correspond in weight to approximately
80% of the soluble protein in ascites cells. Based on results from
N-terminal sequencing, at least 40% of the spots in the human liver protein
2-D gel map appear to be blocked [3]. The corresponding number, derived
from 107 spots in the 2-D gel map of human T-lymphocyte proteins, falls
between 60 and 65% (J. Strahler, personal communication). Information
concerning N-terminal blockage is not normally available, and in the
Swiss-Prot database only 6 of the 36 keratinocyte proteins are specified as
N-terminally blocked. We have, within the present material, defined 18
proteins for which the N-terminals are very likely to be correctly
described. Six of these proteins are listed in the Swiss-Prot database as
N-terminally blocked, four represent proteins which appear in the human
liver 2-D gel map and have been N-terminally sequenced as liver proteins
[3] and the remaining eight have N-terminal groups other than M, S and A,
i.e., N-terminals for which N-acetylation is uncommon [26]. In Figs. 4A, B,
C and D, pI values calculated from Swiss-Prot database information are
plotted against the experimentally determined pI values for all the
keratinocyte proteins listed in Table 2 and for the 18 selected proteins,
as well as for the plasma and liver proteins (from [9], valid for 10°C)*.

The calculations show that without knowledge of the status of the
N-terminal group, precise predictions of pI values for eukaryotic proteins
cannot be achieved based on the information available in Swiss-Prot and
similar databases. However, for proteins where the N-terminal status is
known, we find good correlation between predicted and experimental pI
values. When the variance of the pI discrepancies and the variance of
calculated charges at the experimental pI values derived from the present
data set are compared with the corresponding values derived from the data
on plasma and liver proteins in [9] (Table 3), the present data are found
to result in larger variances for the values of both pI discrepancies and
calculated charge at the experimental pI value when no information on
posttranslational modification is taken into consideration. Correction for
possible N-acetylation of 12 polypeptides with M, S and A as N-terminal
results in a smaller variance of pI discrepancies, although not
significantly different from values derived from [9], whereas the variance
of the calculated charge at the experimental pI value is significantly
higher. For the 18 selected proteins the variance for the pI discrepancies
is significantly smaller than for the data in [9]; however, the
corresponding value for calculated charge at the experimental pI value does
not improve to the same extent. This, we believe, reflects another
difference between the two sets of proteins used for the calculations.
Based on spot distributions in 2-D gel maps, the set of proteins used here
has a molecular weight distribution that is more representative of the
patterns observed in mammalian cells. In the study by Bjellqvist et al. [9]
most of the high molecular weight plasma proteins had to be excluded due to
their unknown content of sialic acid, which made the proteins analyzed in
that study heavily biased towards low molecular weight proteins. The buffer
capacity of proteins normally increases with the protein's molecular
weight, and the average buffer capacity of the presently selected proteins
with assumed known N-terminals is 18 charge units/pH unit, while the
corresponding value for the proteins used in [9] is only 9 charge units/pH
unit. High buffer capacity can be expected to improve the agreement between
calculated and experimental pI values. Inspection of the data presented in
Table 2 for the polypeptides with assumed known N-terminals verifies the
importance of the buffer capacity. For 8 polypeptides having buffer
capacities higher than 15 charge units/pH unit, the calculations in all
cases yielded pI discrepancies with absolute values of less than 0.02 pH
units. The largest discrepancy, 0.06 pH units, was observed for annexin II
and stathmin, proteins which have low buffer capacity: 0.9 and 6.6 charge
units/pH unit, respectively. The probability that the focusing position of
a protein with known composition will fall within a certain distance from
the calculated pI value therefore cannot be predicted by the variance
alone. The buffer capacity of the specific protein must be taken into
consideration as well. As indicated by the decrease of the variance of
calculated charges at the experimental pI value for the selected proteins,
the observed improvement cannot solely be due to the higher buffer capacity
of the keratinocyte proteins. The two studies relate to different
experimental conditions. Good agreement between experimental and calculated
pI values implies that the proteins are unfolded and a factor that may
contribute to the observed improvement is a more complete unfolding of
proteins caused by the higher temperature and urea concentration used in
this study.

* There are four plots: (A) the 36 polypeptides from normal human
keratinocytes (no corrections), (B) the 36 polypeptides from Fig. 4A where
pI values have been recalculated for 12 polypeptides with M, S and A as
N-terminal, assumed blocked based on calculated charge, (C) the 18 selected
polypeptides with information on the N-terminal configuration, and (D)
plasma and liver proteins.

Figure 4. Calculated vs. experimental pI values. Lines are fitted using the
least squares criterion. (A) 36 polypeptides from normal human
keratinocytes (no corrections). (B) 36 polypeptides from Fig. 4A (including
the 18 marker polypeptides) where pI values have been recalculated assuming
N-terminal blockage; x indicates recalculated pI values; nucleolar protein
B23 is indicated with an arrow. (C) 18 polypeptides with information on
N-terminal configuration and (D) plasma and liver proteins.

The data indicate that the precision with which pI values can be predicted
for polypeptides with high buffer capacity is better than the precision
with which experimental pI values can be determined. If the pH is defined
through the pK values of the immobilized groups in the IPG-containing gel,
the precision of the experimentally derived data will depend on the pH
difference between the pI and the pK value of the immobilized group with
the closest pK. For the present study this will give pI determinations with
a precision varying in the range of ± 0.02-0.05 pH units [9]. The good
agreement observed between the calculated and experimental pI values is due
to the fact that errors are mainly systematic and, as discussed in [9],
they will largely be cancelled out in the calculations. A pH scale defined
through the presently determined pI values will not necessarily reflect the
variation of the hydrogen ion activity during the focusing step in an
optimal way, but it still allows precise predictions of focusing positions
for polypeptides with known compositions, including information on
posttranslational modifications. Calculated net charge at the
experimentally found isoelectric point defined in this scale will serve as
a tool to verify that the polypeptide composition used in the calculation
is correct and complete. Exceptions to this are proteins such as involucrin
and heat shock protein 90 that have very high buffer capacities.
Introduction of an extra charge unit into these proteins will only result
in pI shifts falling in the range of 0.01-0.02 pH units. In these cases the
quality of the pH definition (the precision with which the pK values used
in the calculations are given and the precision of the experimental pI
values) will limit the possibilities to verify polypeptide composition
based on the experimental pI value.
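The link between buffer capacity and the pI shift produced by an extra charge can be made explicit with a small sketch. It assumes a first-order (linear) approximation of the titration curve near the pI, so that the shift equals the charge change divided by the buffer capacity; this approximation and the function names are illustrative assumptions rather than part of the calculations reported here.

```python
def pi_shift(extra_charge: float, buffer_capacity: float) -> float:
    """Approximate pI shift (pH units) caused by a change in net charge.

    buffer_capacity is given in charge units per pH unit; the titration
    curve is assumed to be linear close to the isoelectric point.
    """
    if buffer_capacity <= 0:
        raise ValueError("buffer capacity must be positive")
    return extra_charge / buffer_capacity


def implied_buffer_capacity(extra_charge: float, shift: float) -> float:
    """Buffer capacity implied by a given charge change and pI shift."""
    if shift == 0:
        raise ValueError("shift must be non-zero")
    return extra_charge / shift
```

With the average buffer capacities quoted above, `pi_shift(1, 18)` and `pi_shift(1, 9)` give the expected shift for one charge unit in the two protein sets, while `implied_buffer_capacity(1, 0.02)` and `implied_buffer_capacity(1, 0.01)` indicate the buffer capacities at which a single charge becomes hard to resolve within the stated experimental precision.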

Statistical comparison of experimental and calculated pI values was done
using the t-test for dependent samples and normality of the discrepancies
was estimated by probability plots. For the 36 proteins, the p-level is
0.0021, indicating that a result like this is unlikely to be a chance
effect and must be assumed to represent a real difference. After correction
for the most likely N-terminal configuration, the p-level is 0.043; the
calculated and experimental values still cannot be accepted as representing
the same population, since the p-level is less than 0.05, the traditional
limit of statistical significance. For the 18 proteins with a known or very
likely N-terminal configuration the t-test gave a p-level of 0.49, which
verifies that the experimental and calculated pI values are not
significantly different.

Besides showing that pI values for denatured proteins with known
compositions can be calculated with a high degree of precision from average
pK values, the results also provide strong support for the notion that
N-terminal blockage heavily depends on the nature of the N-terminal groups
[26]. The results seem to indicate that with N-terminals other than M, S
and A, only a few proteins have blocked N-terminals (1 out of 10 proteins
in the present study), while it can be inferred from the data presented in
Table 2 that a majority of the proteins with M, S and A as N-terminal are
blocked. After correction for the effect of suspected N-terminal blockage
there is only one protein (nucleolar protein B23) out of the 36 used in
this study, which, in spite of a high buffer capacity, has a marked
difference of 0.11 pH units between predicted and determined pI values
(Fig. 4B); this corresponds to 3 charge units due to the high buffer
capacity of this protein. This discrepancy in pI prediction and calculation
of net charge at the pI is probably not due to deficiencies in the database
information but instead reflects a shortcoming of the model used for pI
calculations. Nucleolar protein B23 contains a domain extremely rich in
aspartic and glutamic acid residues (Table 4), in which 26 out of 28 amino
acid residues from position 161 to 188 are either a D or an E. A
calculation based on the use of average pK values uninfluenced by the
charged neighboring amino acid residues cannot be expected to correctly
describe the pI value with almost half of the acidic groups packed together
into a highly negatively charged region. This limitation caused by
calculations based on average pK values does not severely limit the
usefulness of the approach since a search through Swiss-Prot shows that
this type of D/E-rich motif is uncommon, and the existence of a highly
charged region is immediately apparent upon inspection of the amino acid
sequence.

Table 4. Amino acid sequence of nucleolar phosphoprotein B23

The quality of the information available in databases, especially
concerning posttranslational modifications, is a major problem when the
data are to be used for pI predictions. The p-level of 0.043 found for all
36 proteins after correction for N-acetylation shows that this problem is
not limited to N-terminal blockage, and the very good agreement found for
the eighteen polypeptides with presumably correctly described N-terminals
(Fig. 4C) must be regarded as an exception from this point of view.
N-terminal blockage is generally the main problem in relation to pI
predictions for eukaryotic proteins. Of the 36 keratinocyte proteins
analyzed, 18-20 are suspected to be N-terminally blocked (6 proteins
blocked according to Swiss-Prot, 12 proteins with M, S or A as N-terminal
and presumably blocked based on the calculated charge, and two proteins,
involucrin and nucleolar protein B23, with M as N-terminal for which the
data do not allow any conclusion). This is in reasonable agreement with the
conclusions based on the N-terminal sequencing data derived in connection
with 2-D gel electrophoresis. N-terminal blockage can be suspected for
17-19 of the 26 proteins with M, S or A as N-terminal, while only 1 in 10
proteins with other N-terminal groups is blocked. The information that the
frequency of N-terminal blockage is strongly related to the nature of the
N-terminal group will be of some help in connection with pI predictions
based on database information. However, without information from other
sources, an uncertainty will always remain as to whether the N-terminal
charge should be included in the pI calculation.
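The blockage frequencies quoted above follow directly from the counts; the short sketch below only restates that arithmetic, with a hypothetical helper name.

```python
def blocked_fraction(blocked_low: int, blocked_high: int, total: int):
    """Return the lower and upper bound of the blocked fraction."""
    return blocked_low / total, blocked_high / total


# 17-19 of the 26 proteins with M, S or A as N-terminal are suspected blocked.
msa_low, msa_high = blocked_fraction(17, 19, 26)

# 1 of the 10 proteins with other N-terminal groups is blocked.
other_low, other_high = blocked_fraction(1, 1, 10)
```

Expressed as percentages, roughly two thirds to three quarters of the proteins with M, S or A as N-terminal are suspected to be blocked, against one tenth of the proteins with other N-terminal groups.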



4 Concluding remarks 

The data presented here lay the foundation for comparing 2-D gel protein
maps of different cell types generated with nonlinear, wide-range IPGs in
the first dimension. The focusing positions of 41 polypeptides common to
most human cell types have been described in a pH scale that allows
focusing positions to be predicted with a high degree of accuracy, provided
that the composition of the polypeptides is known and that information on
posttranslational modifications is available. For polypeptides with a very
high buffer capacity, the limiting factor is the precision with which
experimental pI values can be determined rather than the precision of the
calculations. Possible deficiencies in the pH scale description of the
variation of the hydrogen ion activity have, at least at the present state,
no consequences for its practical use. The major limitation in connection
with predictions of focusing positions from polypeptide compositions is the
quality of existing data on protein compositions, especially concerning
posttranslational modifications. Amino acid sequences have been reasonably
easy to obtain, while posttranslational modifications have been difficult
and work-intensive to determine. Recent developments in the field of mass
spectrometry are fast changing this situation and within the next years we
can expect a surge in reliable data in this area. While awaiting this
development, verification of correctness and completeness of available
information on polypeptide composition can be provided by experimental pI
values in a pH scale based on the pI values determined in this study. So
far, our data cover the pH range below pH 7.5. The basic pH range covered
by NEPHGE as first dimension will be covered in forthcoming work.